Opening Files
Supported File Types
PyMuPDF can open files other than just PDF.
The following file types are supported:
Document formats: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT
Image formats (input): JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
Image formats (output): JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS
How to Open a File
To open a file, do the following:
doc = pymupdf.open("a.pdf")
Note
The above creates a Document. The instruction doc = pymupdf.Document("a.pdf") does exactly the same. So, open is just a convenient alias, and its full API is documented in the Document chapter.
Opening with a Wrong File Extension
If you have a document with a wrong file extension for its type, you can still correctly open it.
Assume that “some.file” is actually an XPS. Open it like so:
doc = pymupdf.open("some.file", filetype="xps")
Note
PyMuPDF itself does not try to determine the file type from the file contents. You are responsible for supplying the file type information in some way: either implicitly, via the file extension, or explicitly, as shown with the filetype parameter. There are pure Python packages like filetype that help you do this. Also consult the Document chapter for a full description.
If PyMuPDF encounters a file with an unknown or missing extension, it will try to open it as a PDF, so in these cases no additional precautions are needed. Similarly, for memory documents, you can just specify doc = pymupdf.open(stream=mem_area) to open the data as a PDF document.
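Because the file type has to come from the caller, one option for memory data is to look at the leading bytes before opening. The following is a minimal sketch under stated assumptions: the signature table and the helper names sniff_filetype and open_from_memory are hypothetical and not part of PyMuPDF, and the table is far from exhaustive. Only the pymupdf.open call with the stream and filetype parameters comes from the text above.

```python
# Illustrative leading-byte signatures; this table is an assumption of the
# sketch, not something PyMuPDF provides. ZIP-based formats (XPS, EPUB, CBZ)
# all start with "PK" and cannot be told apart this way, so they are omitted.
SIGNATURES = (
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_filetype(data):
    """Return a filetype string for a recognised signature, else None."""
    for magic, filetype in SIGNATURES:
        if data.startswith(magic):
            return filetype
    return None


def open_from_memory(data):
    """Open bytes with PyMuPDF, passing the sniffed type when one is found."""
    import pymupdf  # imported lazily so sniff_filetype() works on its own

    filetype = sniff_filetype(data)
    if filetype is None:
        # No hint available: PyMuPDF treats the stream as a PDF.
        return pymupdf.open(stream=data)
    return pymupdf.open(stream=data, filetype=filetype)
```

A dedicated detection package will cover many more formats than this short table; the sketch only shows where the detected type is handed to PyMuPDF.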
If you attempt to open an unsupported file, PyMuPDF will raise a file data error (pymupdf.FileDataError).
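A caller that processes files of uncertain origin may want to catch this error rather than let it propagate. The sketch below assumes the error surfaces as pymupdf.FileDataError; the helper name try_open is hypothetical, and whether returning None is the right reaction depends on the application.

```python
def try_open(path, filetype=None):
    """Return a Document, or None if PyMuPDF rejects the file data."""
    import pymupdf  # imported lazily; only needed when the helper is called

    try:
        # filetype=None lets PyMuPDF fall back to the file extension.
        return pymupdf.open(path, filetype=filetype)
    except pymupdf.FileDataError as exc:
        print(f"cannot open {path!r}: {exc}")
        return None
```

Other failures, such as a missing file, are deliberately not caught here so that they remain visible.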
Opening Remote Files
For remote files on a server (i.e. non-local files), you will need to stream the file data to PyMuPDF.
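For example, use the requests library as follows:
import pymupdf
import requests

# Download the file; fail early on HTTP errors instead of parsing an error page.
r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf', timeout=30)
r.raise_for_status()
data = r.content  # raw bytes of the response body
doc = pymupdf.Document(stream=data)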
Opening Files from Cloud Services
For further examples which deal with files held on typical cloud services please see these Cloud Interactions code snippets.
Opening Files as Text
PyMuPDF can open any plain text file as a document. To do this, pass "txt" as the filetype parameter of the pymupdf.open function.
doc = pymupdf.open("my_program.py", filetype="txt")
In this way you can open a variety of file types and use the typical features that are not PDF-specific, such as text searching, text extraction and page rendering. Once you have rendered your txt content, saving it as PDF or merging it with other PDF files is no problem.
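As one possible way to save such a document as PDF, the sketch below converts the text document with the Document method convert_to_pdf and writes the result to disk. The function name text_file_to_pdf and both paths are hypothetical; this is an illustration under those assumptions, not the only route to a PDF.

```python
def text_file_to_pdf(src, dst):
    """Open src as plain text, save it as a PDF at dst, return the page count."""
    import pymupdf  # imported lazily; only needed when the helper is called

    # Lay out the text file as pages and get a PDF file image as bytes.
    with pymupdf.open(src, filetype="txt") as doc:
        pdf_bytes = doc.convert_to_pdf()

    # Re-open those bytes as a PDF document and write it to disk.
    with pymupdf.open(stream=pdf_bytes, filetype="pdf") as pdf:
        pdf.save(dst)
        return pdf.page_count
```

For example, text_file_to_pdf("my_program.py", "my_program.pdf") would produce a PDF that can then be merged with other PDF files.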
Examples
Opening a C# file
doc = pymupdf.open("MyClass.cs", filetype="txt")
Opening an XML file
doc = pymupdf.open("my_data.xml", filetype="txt")
Opening a JSON file
doc = pymupdf.open("more_of_my_data.json", filetype="txt")
And so on!
Many text-based file formats can be opened and interpreted by PyMuPDF in this simple way, which makes data analysis and extraction possible for a wide range of files that were previously unavailable.
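When a script handles a mix of files, the choice of filetype="txt" can be driven by the file suffix. The following sketch assumes a small, hypothetical set of suffixes to treat as plain text; the names TEXT_SUFFIXES, filetype_hint and extract_text are illustrative and not part of PyMuPDF.

```python
from pathlib import Path

# Suffixes this sketch treats as plain text; the set is an assumption and
# should be adjusted to the files at hand.
TEXT_SUFFIXES = {".txt", ".py", ".cs", ".xml", ".json"}


def filetype_hint(path):
    """Return "txt" for known text suffixes, else None (use the extension)."""
    return "txt" if Path(path).suffix.lower() in TEXT_SUFFIXES else None


def extract_text(path):
    """Return the text of every page of path, joined into one string."""
    import pymupdf  # imported lazily so filetype_hint() works on its own

    with pymupdf.open(path, filetype=filetype_hint(path)) as doc:
        return "".join(page.get_text() for page in doc)
```

With this, extract_text("my_data.xml") opens the file as text, while extract_text("a.pdf") leaves the type to be taken from the extension.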